Association Between Nominal Categorical Variables: New Measure Formulation Based on Metric Distances and Value Validity

Abstract

When dealing with nominal categorical data, it is often desirable to know the degree of association or dependence between the variables. While there is literally no limit to the number of alternative measures that have been proposed over the years, they all yield greatly varying, contradictory, and unreliable results due to their lack of an important property: value validity. After discussing the value-validity property, this paper introduces a new measure of association (dependence) based on the mean Euclidean distance between probability distributions, one of them being the distribution under independence. Both an asymmetric form, for when one variable can be considered the explanatory (independent) variable and the other the response (dependent) variable, and a symmetric form are introduced. Particular emphasis is given to the 2 × 2 case, in which each variable has two categories, but the general case of any number of categories is also covered. Besides having the value-validity property, the new measure has all the prerequisites of a good association measure. Comparisons are made with the well-known Goodman–Kruskal lambda and tau measures. A statistical inference procedure for the new measure is derived, and numerical examples are provided.
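
The abstract does not state the formula of the new measure, so the sketch below does not reproduce it. It implements the two classical comparison measures named above, Goodman–Kruskal λ and τ for predicting the column variable Y from the row variable X, using their standard definitions. It also includes a hypothetical `mean_distance_from_independence`, an assumption made only for illustration. That function averages, with weights P(X = i), the Euclidean distance between each conditional distribution of Y given X = i and the marginal distribution of Y, which is what every conditional distribution equals under independence. It is not normalized to [0, 1] and is not claimed to be the paper's measure or to have the value-validity property. All function names are illustrative.

```python
from math import isclose, sqrt
from typing import List, Sequence, Tuple

# Rows index the explanatory variable X, columns the response variable Y.
Table = Sequence[Sequence[float]]


def _proportions(table: Table) -> Tuple[List[List[float]], List[float], List[float]]:
    """Convert counts n_ij to joint proportions p_ij plus row (p_i+) and column (p_+j) margins."""
    n = sum(sum(row) for row in table)
    if n <= 0:
        raise ValueError("table must have a positive total count")
    p = [[c / n for c in row] for row in table]
    row_m = [sum(row) for row in p]
    col_m = [sum(col) for col in zip(*p)]
    return p, row_m, col_m


def gk_lambda(table: Table) -> float:
    """Goodman-Kruskal lambda for predicting Y (columns) from X (rows)."""
    p, _, col_m = _proportions(table)
    base = max(col_m)  # P(correct) when always guessing the modal category of Y
    if isclose(base, 1.0):
        raise ValueError("lambda is undefined when Y takes a single value")
    with_x = sum(max(row) for row in p)  # P(correct) guessing the modal Y within each row
    return (with_x - base) / (1.0 - base)


def gk_tau(table: Table) -> float:
    """Goodman-Kruskal tau for predicting Y (columns) from X (rows)."""
    p, row_m, col_m = _proportions(table)
    base = sum(c * c for c in col_m)  # sum_j p_+j^2
    if isclose(base, 1.0):
        raise ValueError("tau is undefined when Y takes a single value")
    # sum_i sum_j p_ij^2 / p_i+, skipping empty rows
    with_x = sum(pij * pij / ri for row, ri in zip(p, row_m) if ri > 0 for pij in row)
    return (with_x - base) / (1.0 - base)


def mean_distance_from_independence(table: Table) -> float:
    """HYPOTHETICAL illustration, not the paper's measure:
    sum_i p_i+ * Euclidean distance between p(Y | X = i) and p(Y)."""
    p, row_m, col_m = _proportions(table)
    total = 0.0
    for row, ri in zip(p, row_m):
        if ri == 0:
            continue  # an empty row carries no weight
        cond = [pij / ri for pij in row]  # conditional distribution of Y given X = i
        total += ri * sqrt(sum((a - b) ** 2 for a, b in zip(cond, col_m)))
    return total
```

For example, with the 2 × 2 table `[[40, 10], [30, 20]]` (X in rows, Y in columns), `gk_lambda` returns 0 although the conditional distributions of Y differ between the rows (0.8/0.2 versus 0.6/0.4). By contrast, `gk_tau` returns about 0.048 and the distance sketch about 0.141. λ is 0 here because the modal category of Y is the same in every row, which is one concrete way in which association measures can disagree sharply.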

Similar resources

Testing for Association between Categorical Variables with Multiple-Response Data

Some survey questions provide respondents with a list of possible answers and instructions to "mark all that apply." This paper presents methods for determining when the responses to a mark-all-that-apply question are associated with the responses to a standard question whose answers fall in one of several mutually exclusive categories. Such data originate from many sources including large-scale ...

An association-based dissimilarity measure for categorical data

In this paper, we propose a novel method to measure the dissimilarity of categorical data. The key idea is to consider the dissimilarity between two categorical values of an attribute as a combination of dissimilarities between the conditional probability distributions of other attributes given these two values. Experiments with real data show that our dissimilarity estimation method improves t...

Task-Based Language Teaching in Iran: A Mixed Study through Constructing and Validating a New Questionnaire Based on Theoretical, Sociocultural, and Educational Frameworks

Various aspects of life in Iran, including lifestyle, science, and technical and technological facilities, can be regarded as more or less imported. The English language and the methods for teaching it are no exception to this rule. Nevertheless, the question sometimes arises whether a particular method is compatible with the theoretical, sociocultural, and educational infrastructure of Iranian society. This study was conducted using mixed methods. Also, a questionnaire for language learners ...

New Filter method for categorical variables’ selection

It is worth noting that the variable-selection process has become an increasingly exciting challenge, given the dramatic increase in the size of databases and the number of variables to be explored and modelized. Therefore, several strategies and methods have been developed with the aim of selecting the minimum number of variables while preserving as much information for the interest variable o...

The Similarity for Nominal Variables Based on F-Divergence

Measuring the similarity between nominal variables is an important problem in data mining. It is the basis for measuring the similarity of data objects that contain nominal variables. There are two kinds of traditional methods for this task: the first simply distinguishes variables as the same or not the same, while the second measures the similarity based on co-occurrence with variables of other attr...

Journal

Journal title: Journal of Statistical Theory and Practice

Year: 2023

ISSN: 1559-8616, 1559-8608

DOI: https://doi.org/10.1007/s42519-023-00344-5